24 research outputs found

    Quantitative and evolutionary global analysis of enzyme reaction mechanisms

    The most widely used classification system for enzyme-catalysed reactions is the Enzyme Commission (EC) number. Understanding enzyme function is important for both fundamental scientific and pharmaceutical reasons. The EC classification is essentially unrelated to the reaction mechanism. In this work we address two important questions related to the diversity of enzyme function. First, we investigate the relationship between the reaction mechanisms described in the MACiE (Mechanism, Annotation, and Classification in Enzymes) database and the top-level classes of the EC classification. Second, we ask how well enzyme biocatalysis is adapted in nature. In this thesis, we retrieved 335 enzyme reactions from the MACiE database. We consider two ways of encoding the reaction mechanism in descriptors, and three approaches that encode only the overall chemical reaction. We first develop a basic model to cluster the enzymatic reactions. A global study of enzyme reaction mechanisms may provide important insights into the diversity of the chemical reactions of enzymes. Clustering analysis is common practice in such research. Clustering algorithms, however, suffer from various issues, such as requiring the input parameters and stopping criteria to be determined and, very often, the number of clusters to be specified in advance. Using several well-known metrics, we tried to optimize the clustering outputs of each algorithm, with equivocal results that suggested the existence of anywhere between two and over a hundred clusters. This motivated us to design and implement our own algorithm, PFClust (Parameter-Free Clustering), which requires no prior information to determine the number of clusters. The analysis highlights the structure of the overall and mechanistic enzyme reactions.
This suggests that mechanistic similarity can influence approaches for function prediction and automatic annotation of newly discovered protein and gene sequences. We then develop and evaluate a method for enzyme function prediction using machine learning. Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. The machine learning method needs only chemoinformatics descriptors as input and is applicable to regression analysis. The last phase of this work tests the evolution of chemical mechanisms mapped onto ancestral enzymes. The occurrence and abundance of protein domains in modern proteins has shown that the α/β architecture is probably the oldest fold design. These observations have important implications for the origins of biochemistry and for exploring structure-function relationships. Over half of the known mechanisms were introduced before architectural diversification over evolutionary time. The other half were invented gradually over the evolutionary timeline, just after organismal diversification. Moreover, many common mechanisms, including fundamental building blocks of enzyme chemistry, were found to be associated with the ancestral fold.

    BDNF: mRNA expression in urine cells of patients with chronic kidney disease and its role in kidney function

    Podocyte loss and changes to the complex podocyte morphology are major causes of chronic kidney disease (CKD). As the incidence has been increasing continuously over recent decades without sufficient treatment, it is important to find predictive biomarkers. Therefore, we measured urinary mRNA levels of the podocyte genes NPHS1, NPHS2 and PODXL, as well as BDNF, KIM-1 and CTSL, by qRT-PCR in 120 CKD patients. We showed a strong correlation between BDNF and the kidney injury marker KIM-1, both of which were also correlated with NPHS1, suggesting podocytes as a contributing source. In human biopsies, BDNF was localized in the cell body and major processes of podocytes. In glomeruli of diabetic nephropathy patients, we found a strong BDNF signal in the remaining podocytes. Inhibition of the BDNF receptor TrkB resulted in enhanced podocyte dedifferentiation. Knockdown of the orthologue resulted in pericardial oedema formation and lowered viability of zebrafish larvae. We found an enlarged Bowman's space, dilated glomerular capillaries, podocyte loss and impaired glomerular filtration. We demonstrated that BDNF is essential for glomerular development, morphology and function, and that the expression of BDNF and KIM-1 is highly correlated in urine cells of CKD patients. Therefore, BDNF mRNA in urine cells could serve as a potential CKD biomarker.

    Is EC class predictable from reaction mechanism?

    We thank the Scottish Universities Life Sciences Alliance (SULSA) and the Scottish Overseas Research Student Awards Scheme of the Scottish Funding Council (SFC) for financial support. Background: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. Results: The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. Conclusions: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification.
We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction. Publisher PDF. Peer reviewed.
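As an illustration of the evaluation idea in the abstract above (a kNN classifier assessed by cross-validation), the following is a minimal sketch only. The abstract does not specify the descriptors, the distance measure, or the validation scheme, so the set-valued features, the Tanimoto similarity, the leave-one-out scheme, and all data and labels below are assumptions made for illustration.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def knn_predict(query, training, k=3):
    """Predict a label by majority vote among the k most similar training items.

    training is a list of (feature_set, label) pairs.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each item in turn, predict it from the rest, and report accuracy."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(features, rest, k) == label
    return hits / len(data)


# Hypothetical toy data: integer "descriptor bits" and made-up class labels.
data = [
    ({1, 2, 3}, "EC1"), ({1, 2, 4}, "EC1"), ({1, 3, 4}, "EC1"),
    ({7, 8, 9}, "EC3"), ({7, 8, 10}, "EC3"), ({7, 9, 10}, "EC3"),
]
accuracy = leave_one_out_accuracy(data, k=3)
```

Because kNN relies only on local neighbours, a sketch like this also makes concrete why the abstract expects it to do worse when non-local information matters.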

    The natural history of biocatalytic mechanisms

    JBOM and NN thank the Scottish Universities Life Science Alliance (SULSA) http://www.sulsa.ac.uk/ and Scottish Funding Council (SFC) http://www.sfc.ac.uk/ for financial support. JBOM thanks the Biotechnology and Biological Sciences Research Council (BBSRC) http://www.bbsrc.ac.uk/ for financial support through grant BB/I00596X/1 and GCA the National Science Foundation (OISE-1132791) http://www.nsf.gov/ and the United States Department of Agriculture (ILLU-802-909 and ILLU-483-625) http://www.csrees.usda.gov/ for financial support. EaStCHEM http://www.eastchem.ac.uk/ provided access to the ECRF computing facility. Phylogenomic analysis of the occurrence and abundance of protein domains in proteomes has recently shown that the α/β architecture is probably the oldest fold design. This holds important implications for the origins of biochemistry. Here we explore structure-function relationships addressing the use of chemical mechanisms by ancestral enzymes. We test the hypothesis that the oldest folds used the most mechanisms. We start by tracing biocatalytic mechanisms operating in metabolic enzymes along a phylogenetic timeline of the first appearance of homologous superfamilies of protein domain structures from CATH. A total of 335 enzyme reactions were retrieved from MACiE and were mapped over fold age. We define a mechanistic step type as one of the 51 mechanistic annotations given in MACiE, and each step of each of the 335 mechanisms was described using one or more of these annotations. We find that the first two folds, the P-loop containing nucleotide triphosphate hydrolase and the NAD(P)-binding Rossmann-like homologous superfamilies, were α/β architectures responsible for introducing 35% (18/51) of the known mechanistic step types. We find that these two oldest structures in the phylogenomic analysis of protein domains introduced many mechanistic step types that were later combinatorially spread in catalytic history.
The most common mechanistic step types included fundamental building blocks of enzyme chemistry: “Proton transfer,” “Bimolecular nucleophilic addition,” “Bimolecular nucleophilic substitution,” and “Unimolecular elimination by the conjugate base.” They were associated with the most ancestral fold structure typical of P-loop containing nucleotide triphosphate hydrolases. Over half of the mechanistic step types were introduced in the evolutionary timeline before the appearance of structures specific to diversified organisms, during a period of architectural diversification. The other half unfolded gradually after organismal diversification and during a period that spanned ~2 billion years of evolutionary history. Publisher PDF. Peer reviewed.
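The mapping of step types over fold age described above can be sketched as a first-appearance calculation. This is a minimal illustration, not the authors' pipeline: the fold ages, the placeholder step names, and the cutoff are hypothetical, and only the 18/51 figure comes from the abstract.

```python
def first_appearance(folds):
    """Return {step_type: smallest fold age at which any fold uses it}.

    folds is a list of (fold_age, set_of_step_types) pairs.
    """
    first = {}
    for age, step_types in folds:
        for step in step_types:
            if step not in first or age < first[step]:
                first[step] = age
    return first


def fraction_introduced_by(first, age_cutoff, total_step_types):
    """Fraction of all step types whose first appearance is at or before the cutoff."""
    introduced = sum(1 for age in first.values() if age <= age_cutoff)
    return introduced / total_step_types


# Hypothetical folds with placeholder step-type names (not real MACiE data).
folds = [
    (0.00, {"step_a", "step_b"}),
    (0.02, {"step_a", "step_c"}),
    (0.40, {"step_a", "step_d"}),
]
first = first_appearance(folds)
share_toy = fraction_introduced_by(first, age_cutoff=0.02, total_step_types=4)

# The figure reported in the abstract: 18 of 51 step types, about 35%.
share_reported = 18 / 51
```

In the toy data, three of the four placeholder step types appear by the cutoff; the reported 18/51 rounds to the 35% quoted in the abstract.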

    The history of biocatalytic mechanisms.

    The heat map describes the distribution of presence (red) and absence (yellow) of mechanism step types (y-axis) over fold age (x-axis). Rows of the heat map (mechanisms) are ordered vertically according to the first appearance of the step type in time, with the oldest at the top. The row sidebars at the top of the heat map are used to describe the number of MACiE entries and CATH H-level domain structures (annotated as number of folds) appearing at each fold age, and presence of top-level EC classes that are associated with these H-level structures (see color key). The x-axis scale reflects the different nd values found in our dataset, arranged from the oldest on the left to the youngest on the right. Every unique nd value forms a separate column. The non-linear scale is defined by the number of unique nd values falling in each interval of nd. There are many distinct nd values between 0.0 and 0.3 found in our dataset, so the scale is expanded in this region. There are few distinct nd values between 0.7 and 1.0, so the scale is very condensed in that region. Geological time is taken to be approximately linear with nd, where nd = 0 represents the origin of the protein world approximately 3.8 billion years ago and nd = 1 corresponds to the present [4].
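The two scale conventions in this caption can be sketched in a few lines: one column per unique nd value, and an approximately linear map from nd to geological time. The function names and the sample nd values are hypothetical; the 3.8 billion year origin and the linearity assumption are taken from the caption.

```python
ORIGIN_GYA = 3.8  # age of the protein world at nd = 0, as stated in the caption


def nd_to_gya(nd):
    """Approximate age in billions of years ago, assuming time is linear in nd."""
    if not 0.0 <= nd <= 1.0:
        raise ValueError("nd must lie in [0, 1]")
    return ORIGIN_GYA * (1.0 - nd)


def column_index(nd_values):
    """Give every unique nd value its own column, oldest (smallest nd) on the left."""
    return {nd: col for col, nd in enumerate(sorted(set(nd_values)))}


# Hypothetical nd values; the repeated 0.05 shares a single column.
sample = [0.0, 0.05, 0.05, 0.12, 0.3, 0.75, 1.0]
columns = column_index(sample)
```

Because columns are assigned per unique value rather than per unit of nd, regions with many distinct values take up more width, which is the non-linear scale the caption describes.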

    MACiE enzymes for purine metabolism.

    Table columns are: MACiE code, Enzyme name, EC number, Purine metabolic subnetwork [41], PDB code, CATH H-level structure, nd value and mechanistic step types.

    Pattern 133, the mechanistic step types associated with CATH 3.20.20.70, Aldolase class I.

    Pattern 133, the mechanistic step types associated with CATH 3.20.20.70, Aldolase class I.

    Heat map representing the similarity of mechanistic step types utilised by the H-level structures.

    For this we have calculated the Jaccard similarity scores. Here the x and y axes in the plot are ordered using a hierarchical clustering algorithm in which the two most similar data points are linked together at each iteration. The colors of the heatmap represent the similarity scores where yellow suggests low or no (when 0) similarity and white (1) means that identical combinations of mechanistic steps are shared between two H-level structures. The top left corner represents the color key for the similarity scores and the distribution of the similarity scores.
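The Jaccard score used in this caption is the size of the intersection of two sets divided by the size of their union. The sketch below computes it for sets of step types and picks the most similar pair, which is the pair a clustering step of the kind described would link first. The structure labels and step names are hypothetical placeholders, and the convention for two empty sets is an assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of step types."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def most_similar_pair(step_sets):
    """Return the pair of labels whose step-type sets have the highest Jaccard score."""
    return max(
        combinations(sorted(step_sets), 2),
        key=lambda pair: jaccard(step_sets[pair[0]], step_sets[pair[1]]),
    )


# Hypothetical H-level structures with placeholder step-type names.
step_sets = {
    "H1": {"step_a", "step_b", "step_c"},
    "H2": {"step_a", "step_b", "step_c"},
    "H3": {"step_c", "step_d"},
}
pair = most_similar_pair(step_sets)
```

A score of 1 corresponds to the white cells of the heat map (identical combinations of steps) and a score of 0 to structures that share no step types.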